Skip to content

Mdc/add protpardelle#274

Open
marcuscollins wants to merge 12 commits into
mainfrom
mdc/add-protpardelle
Open

Mdc/add protpardelle#274
marcuscollins wants to merge 12 commits into
mainfrom
mdc/add-protpardelle

Conversation

@marcuscollins

@marcuscollins marcuscollins commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

This is a working version of Protpardelle-1c in SampleWorks. It may still require parameter tuning and other updates--in particular it isn't clear how to use self-conditioning. However we can generate structures with this version and so I'm making this PR as a milestone.

Only a couple significant changes have been made outside the Protpardelle wrapper code itself. One is to make a features class that is unfrozen, so that we can pass self-conditioning input forward to the next Euler step during sampling. Another is that we define different sampling parameters for Protpardelle when instantiating the sampler. There is no CLI control for either change in this PR.

Summary by CodeRabbit

  • New Features
    • Added Protpardelle-based structure guidance with model selection and Protpardelle-specific CLI options.
    • Added a new cc89_epoch415 sample configuration.
    • Extended pixi/macOS setup and environments to support the Protpardelle workflow.
  • Bug Fixes
    • Improved guidance runtime feedback by logging the parsed configuration.
    • Updated EDM sampling so the sampled noise is passed into the model step.
  • Chores
    • Updated ignore rules to keep sample data while excluding other data folders.
  • Tests
    • Added comprehensive Protpardelle wrapper and CLI parsing tests, plus adjusted runner/argv-related tests for determinism.

marcuscollins and others added 7 commits June 29, 2026 09:34
Wrap sequence-conditioned all-atom Protpardelle-1c models (task
"ai-allatom", e.g. cc89.yaml) under the StructureModelWrapper protocol.
featurize() derives sequence conditioning (aatype/seq_mask/residue_index/
chain_index/atom37 mask) from an Atomworks structure's chain_info, and
step() runs the full reverse-diffusion trajectory internally via
Protpardelle.sample(gt_aatype=...), returning [batch, atoms, 3] coords.

- Register PROTPARDELLE_AVAILABLE / require_protpardelle /
  check_protpardelle_available (catching OSError from protpardelle.env).
- Add StructurePredictor.PROTPARDELLE.
- Tests build a small random ai-allatom model (no weights needed);
  cover sequence extraction, featurization shapes, prior init, protocol
  conformance, and an end-to-end short-trajectory sampling smoke test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## Summary

Completes the Protpardelle debugging work from ENG-75 / #257 on top of
`mdc/add-protpardelle`.

This PR intentionally targets `mdc/add-protpardelle` so the review
contains only the final Protpardelle fixes, not the whole in-progress
integration branch.

Key changes:

- Restores the EDM sampler/model-wrapper call shape to the
protocol-compatible `step(noisy_state, t_hat, features=features)` after
removing the temporary extra `eps` argument.
- Implements Protpardelle step-time noise handling via
`_expand_noise_level()`, broadcasting scalar or per-batch EDM timesteps
to Protpardelle's expected `B x L` tensor.
- Moves Protpardelle step inputs/conditioning tensors onto the wrapper
device before the model forward pass.
- Keeps `prot_lens_per_chain` on CPU when calling Protpardelle's
sampling helper to avoid CPU/GPU device mismatch in helper-created
residue indices.
- Maps Atomworks selenomethionine atom name `SE` into Protpardelle
atom37's methionine sulfur slot `SD`.
- Fixes `ProtpardelleConditioning` immutability so dataclass
construction can complete before selected conditioning fields become
frozen.
- Allows packaged Protpardelle config/data under
`src/sampleworks/data/**` through `.actlignore` so ACTL sync includes
the runtime YAML config.
- Re-enables the slow `step()` behavior test and adds coverage for the
MSE selenium atom mapping.

## Validation

Local static checks:

- `uvx ty check src/sampleworks/models/protpardelle/wrapper.py
src/sampleworks/core/samplers/edm.py
tests/models/protpardelle/test_protpardelle_wrapper.py`
- `uvx ruff check src/sampleworks/models/protpardelle/wrapper.py
src/sampleworks/core/samplers/edm.py
tests/models/protpardelle/test_protpardelle_wrapper.py`

Remote ACTL / Protpardelle environment checks:

- `pixi run -e protpardelle-dev pytest tests/models/protpardelle -m "not
slow"`
  - `27 passed, 1 deselected`
- `pixi run -e protpardelle-dev pytest
tests/models/protpardelle/test_protpardelle_wrapper.py::TestStep::test_step_returns_coords`
  - `1 passed`
- Exact reduced Protpardelle guidance smoke command from #257 using the
shared checkpoint and 1VME inputs
  - completed successfully with `Guidance run successfully!`
  - wrote results to `output/protpardelle-smoke-guided/`

Non-blocking warnings observed during the smoke run were expected
environment warnings for unavailable optional model/tool paths and
missing mirror environment variables; they did not prevent the
Protpardelle run from completing.

Refs #257.

Co-authored-by: xraymemory <me.anzuoni@gmail.com>
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@marcuscollins, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 9 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eff90a13-ede7-49d4-8367-e7de85db03b0

📥 Commits

Reviewing files that changed from the base of the PR and between 505f824 and 83f2ea6.

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • pyproject.toml
📝 Walkthrough

Walkthrough

Adds Protpardelle support across config, wrapper logic, guidance wiring, availability checks, fixtures, and tests. Also updates Pixi metadata, bundled data handling, and runner-test inputs.

Changes

Protpardelle wrapper and guidance integration

Layer / File(s) Summary
Packaged config and environment setup
.actlignore, pyproject.toml, src/sampleworks/data/cc89_epoch415.yaml
Adds the bundled Protpardelle YAML config, Pixi environment and dependency updates, and .actlignore rules that re-include src/sampleworks/data/.
Availability checks and wrapper import guards
src/sampleworks/utils/imports.py
Adds Protpardelle availability detection, guarded import handling, dedicated decorators and checks, and extends generic model-availability helpers.
Protpardelle wrapper contracts and sampling flow
src/sampleworks/models/protpardelle/__init__.py, src/sampleworks/models/protpardelle/wrapper.py
Defines wrapper constants and dataclasses, sequence extraction, structure annotation, featurization, coordinate conversion, sampling kwargs, noise expansion, step(), and prior initialization.
Guidance CLI and sampler integration
src/sampleworks/utils/guidance_constants.py, src/sampleworks/utils/guidance_script_arguments.py, src/sampleworks/utils/guidance_script_utils.py, src/sampleworks/cli/guidance.py, src/sampleworks/core/samplers/edm.py
Adds the Protpardelle enum and CLI args, wires model construction and config propagation, applies Protpardelle-specific EDM sampler settings, and logs the parsed guidance config.
Protpardelle fixtures and tests
tests/models/protpardelle/conftest.py, tests/models/protpardelle/test_protpardelle_wrapper.py, tests/cli/test_guidance_cli.py
Adds a small in-memory Protpardelle model fixture and tests for sequence extraction, annotation, featurization, coordinate conversion, prior initialization, step execution, and CLI argument parsing.
Runner test input hardening
tests/runs/test_runner.py
Updates runner tests to use smaller GPU-count overrides and relaxes one argv assertion to check the script path by substring.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • k-chrispens

Poem

🐇 I hopped through configs, soft and bright,
With atom37 steps and guidance light.
Protpardelle twirls in a tidy line,
And little tests say, “All is fine!”
Hoppy beams across the data glow ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 48.68% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title is related to the change, but it is too vague and reads like a branch label rather than a clear summary. Rename it to a concise, descriptive sentence such as "Add Protpardelle support".
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch mdc/add-protpardelle

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (1)
tests/models/protpardelle/test_protpardelle_wrapper.py (1)

147-160: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Keep these assertions at the public wrapper boundary.

These tests reach into _build_sampling_kwargs() and _convert_to_atom37() directly, which makes the suite brittle to internal refactors that preserve featurize(), initialize_from_prior(), and step() behavior. Either move the coverage to public API outcomes or promote the helpers if they are intended contract. As per coding guidelines, tests/**/*.py: "Write black-box tests that verify behavior, not implementation."

Also applies to: 287-320

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/models/protpardelle/test_protpardelle_wrapper.py` around lines 147 -
160, The tests are asserting private helper behavior via
ProtpardelleWrapper._build_sampling_kwargs() and _convert_to_atom37(), which
makes them brittle to internal refactors. Move this coverage to black-box
assertions around the public wrapper methods featurize(),
initialize_from_prior(), and step(), or explicitly promote those helpers if they
are meant to be part of the contract. Keep the checks focused on observable
outputs and behavior at the wrapper boundary rather than internal implementation
details.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pyproject.toml`:
- Around line 139-141: The pyproject.toml currently defines the
dependency-overrides table twice under tool.pixi.pypi-options, which breaks TOML
parsing. Consolidate the duplicate [tool.pixi.pypi-options.dependency-overrides]
blocks into a single table, and keep the pandas and gemmi pins together there
only if they are intended to apply globally.

In `@src/sampleworks/models/protpardelle/wrapper.py`:
- Around line 707-709: The self-conditioning cached in
features.conditioning.x_self_conditioning is being stored with its autograd
history intact, which can carry the prior graph into later sampling steps.
Update the assignment in step() to cache a detached version of x_self_cond
before writing it to features.conditioning.x_self_conditioning, so repeated
iterations do not retain gradients across steps.

In `@src/sampleworks/utils/guidance_script_utils.py`:
- Around line 223-230: In the Protpardelle branch of guidance_script_utils, the
config_path is still built from a hardcoded src/sampleworks/data path, which
breaks when the package is installed and run from outside the repo. Update the
ProtpardelleWrapper config_path handling to load cc89_epoch415.yaml as a bundled
package resource using the package’s resource-loading APIs instead of resolving
a filesystem path relative to the current working directory. Keep the change
localized to the model-loading logic around StructurePredictor.PROTPARDELLE and
ProtpardelleWrapper.
- Around line 63-67: The Protpardelle fallback in the import block for
ProtpardelleWrapper is too narrow and can still crash non-Protpardelle runs.
Update the try/except around the ProtpardelleWrapper import to catch the
additional failure modes noted in imports.py, not just ImportError, and keep the
existing fallback assignment/logging. Also make the bundled YAML path in the
guidance script resolution independent of the current checkout by resolving it
from the package/module location instead of using a repo-relative path.

In `@src/sampleworks/utils/imports.py`:
- Around line 265-270: The install hint in require_any_model() is stale because
the availability check now includes PROTPARDELLE_AVAILABLE, but the
default_message still only mentions Boltz, Protenix, and RF3. Update the
default_message text in src/sampleworks/utils/imports.py to include Protpardelle
alongside the other supported model options so the decorator’s remediation
guidance matches the current logic.
- Around line 187-230: The bare decorator usage of require_protpardelle is
broken because the function object is being passed into the message parameter,
so the wrapper is not applied. Update require_protpardelle in imports.py to
support both `@require_protpardelle` and `@require_protpardelle`("...") by detecting
when the first argument is a callable vs a custom message, or remove the bare
form from the examples/docstring so only the parenthesized usage is advertised.

In `@tests/models/protpardelle/conftest.py`:
- Around line 17-21: The `os.environ.setdefault(...)` usage is eagerly creating
a temp directory via `tempfile.mkdtemp(...)` even when
`PROTPARDELLE_MODEL_PARAMS` is already set, so update the setup in `conftest.py`
to only call `mkdtemp` when the env var is missing. Apply the same lazy pattern
in `test_protpardelle_wrapper.py` where the same `setdefault`/`mkdtemp` usage
appears, using the existing `PROTPARDELLE_MODEL_PARAMS` guard to avoid
allocating untracked temp dirs on import.

In `@tests/runs/test_runner.py`:
- Around line 29-30: The argv assertion in the test is not hermetic because
runner._build_argv() can switch to a direct Python executable when a baked pixi
env is detected. Update the test setup for the affected assertion in
test_runner.py to force the pixi code path before checking the ["pixi", "run",
"-e", "rf3", "python"] prefix, using the existing test fixture/monkeypatch
around runner._build_argv() so the environment detection cannot change the
expected argv.

---

Nitpick comments:
In `@tests/models/protpardelle/test_protpardelle_wrapper.py`:
- Around line 147-160: The tests are asserting private helper behavior via
ProtpardelleWrapper._build_sampling_kwargs() and _convert_to_atom37(), which
makes them brittle to internal refactors. Move this coverage to black-box
assertions around the public wrapper methods featurize(),
initialize_from_prior(), and step(), or explicitly promote those helpers if they
are meant to be part of the contract. Keep the checks focused on observable
outputs and behavior at the wrapper boundary rather than internal implementation
details.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8a25473f-58ce-43f8-b062-7b353055864f

📥 Commits

Reviewing files that changed from the base of the PR and between 4826154 and 6713208.

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • .actlignore
  • pyproject.toml
  • src/sampleworks/cli/guidance.py
  • src/sampleworks/core/samplers/edm.py
  • src/sampleworks/data/cc89_epoch415.yaml
  • src/sampleworks/models/protpardelle/__init__.py
  • src/sampleworks/models/protpardelle/wrapper.py
  • src/sampleworks/utils/guidance_constants.py
  • src/sampleworks/utils/guidance_script_arguments.py
  • src/sampleworks/utils/guidance_script_utils.py
  • src/sampleworks/utils/imports.py
  • tests/models/protpardelle/__init__.py
  • tests/models/protpardelle/conftest.py
  • tests/models/protpardelle/test_protpardelle_wrapper.py
  • tests/runs/test_runner.py

Comment thread pyproject.toml
Comment thread src/sampleworks/models/protpardelle/wrapper.py Outdated
Comment thread src/sampleworks/utils/guidance_script_utils.py Outdated
Comment thread src/sampleworks/utils/guidance_script_utils.py
Comment on lines +187 to +230
def require_protpardelle(message: str | None = None) -> Callable[[F], F]:
"""Decorator to require Protpardelle model availability.

Parameters
----------
message: str, optional
Custom error message. If None, uses default message.

Returns
-------
Callable
Decorator function

Examples
--------
>>> @require_protpardelle
... def sample_protpardelle():
... pass

>>> @require_protpardelle("Custom error message")
... def custom_function():
... pass
"""
default_message = (
"Protpardelle model wrapper is not available. Install with: "
"pixi install -e protpardelle"
)

def decorator(func: F) -> F:
@functools.wraps(func)
def wrapper(*args: Any, **kwargs: Any) -> Any:
if not PROTPARDELLE_AVAILABLE:
error_msg = message or default_message
try:
import pytest

pytest.skip(error_msg)
except ImportError:
raise ImportError(error_msg) from None
return func(*args, **kwargs)

return wrapper # type: ignore

return decorator

@coderabbitai coderabbitai Bot Jun 29, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

git ls-files src/sampleworks/utils/imports.py
wc -l src/sampleworks/utils/imports.py
sed -n '150,260p' src/sampleworks/utils/imports.py | cat -n

Repository: diff-use/sampleworks

Length of output: 3914


Support the bare decorator form or remove it from the docs. @require_protpardelle passes the function object into message, so the example does not wrap the target function. Either accept both decorator forms or change the examples to require parentheses.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/utils/imports.py` around lines 187 - 230, The bare decorator
usage of require_protpardelle is broken because the function object is being
passed into the message parameter, so the wrapper is not applied. Update
require_protpardelle in imports.py to support both `@require_protpardelle` and
`@require_protpardelle`("...") by detecting when the first argument is a callable
vs a custom message, or remove the bare form from the examples/docstring so only
the parenthesized usage is advertised.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai make an issue to track this as an issue, do not implement it. Let one of the humans implement it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

gh issue create \
  --repo diff-use/sampleworks \
  --title "require_protpardelle decorator does not support bare `@decorator` usage" \
  --assignee marcuscollins \
  --body "## Summary
In \`src/sampleworks/utils/imports.py\`, the \`require_protpardelle\` decorator factory (added around lines 187-230) only supports the parenthesized form \`@require_protpardelle(\"message\")\`. When used as a bare decorator, \`@require_protpardelle\`, the decorated function object is passed in as the \`message\` argument instead of being wrapped, so the function is never actually wrapped/protected. The docstring's first example (\`@require_protpardelle\`) is therefore misleading and does not work as documented.

## Rationale
The same pattern is used for the other \`require_*\` decorators (\`require_boltz\`, \`require_protenix\`, \`require_rf3\`) in the same file, so they likely share this issue, but this report focuses on \`require_protpardelle\` per the originating review comment.

## Affected area
- File: \`src/sampleworks/utils/imports.py\`
- Function: \`require_protpardelle\` (and docstring examples)

## Suggested fix
Either:
1. Update \`require_protpardelle\` to detect when the first positional argument is a callable (bare decorator usage) vs. a string message, and branch accordingly to support both \`@require_protpardelle\` and \`@require_protpardelle(\"message\")\`, or
2. Remove/correct the bare-decorator example in the docstring so only the parenthesized form \`@require_protpardelle()\` is documented/supported.

## Acceptance criteria
- [ ] \`require_protpardelle\` works correctly when used as \`@require_protpardelle\` (bare) or docstring is updated to remove that example.
- [ ] \`require_protpardelle(\"custom message\")\` continues to work as before.
- [ ] Docstring examples accurately reflect supported usage.

## References
- PR: https://github.com/diff-use/sampleworks/pull/274
- Originating review comment: https://github.com/diff-use/sampleworks/pull/274#discussion_r3493412905
- Requested by: `@marcuscollins`"

Length of output: 2450


🏁 Script executed:

#!/bin/bash
set -euo pipefail

gh issue create \
  --repo diff-use/sampleworks \
  --title "require_protpardelle decorator does not support bare `@decorator` usage" \
  --body "## Summary
In \`src/sampleworks/utils/imports.py\`, the \`require_protpardelle\` decorator factory (added around lines 187-230) only supports the parenthesized form \`@require_protpardelle(\"message\")\`. When used as a bare decorator, \`@require_protpardelle\`, the decorated function object is passed in as the \`message\` argument instead of being wrapped, so the function is never actually wrapped/protected. The docstring's first example (\`@require_protpardelle\`) is therefore misleading and does not work as documented.

## Rationale
The same pattern is used for the other \`require_*\` decorators (\`require_boltz\`, \`require_protenix\`, \`require_rf3\`) in the same file, so they likely share this issue, but this report focuses on \`require_protpardelle\` per the originating review comment.

## Affected area
- File: \`src/sampleworks/utils/imports.py\`
- Function: \`require_protpardelle\` (and docstring examples)

## Suggested fix
Either:
1. Update \`require_protpardelle\` to detect when the first positional argument is a callable (bare decorator usage) vs. a string message, and branch accordingly to support both \`@require_protpardelle\` and \`@require_protpardelle(\"message\")\`, or
2. Remove/correct the bare-decorator example in the docstring so only the parenthesized form \`@require_protpardelle()\` is documented/supported.

## Acceptance criteria
- [ ] \`require_protpardelle\` works correctly when used as \`@require_protpardelle\` (bare) or docstring is updated to remove that example.
- [ ] \`require_protpardelle(\"custom message\")\` continues to work as before.
- [ ] Docstring examples accurately reflect supported usage.

## References
- PR: https://github.com/diff-use/sampleworks/pull/274
- Originating review comment: https://github.com/diff-use/sampleworks/pull/274#discussion_r3493412905
- Requested by: `@marcuscollins`"

Length of output: 2226


@marcuscollins, done — I've opened #285 to track this so a human can implement it: #285

Note: I wasn't able to set you as the assignee due to a permissions error on the integration's GraphQL mutation, so you'll need to assign it manually.

🐰📝


🧠 Learnings used
Learnt from: k-chrispens
Repo: diff-use/sampleworks PR: 73
File: src/sampleworks/core/samplers/protocol.py:1-18
Timestamp: 2026-02-02T19:13:51.594Z
Learning: In Python files across the repo, do not suggest or add inline or file-level suppressions for Ruff F722 (jaxtyping dimension strings). Instead rely on a global ignore configured in pyproject.toml (as done by the sampleworks project). Verify that F722 is included in the global ignore, and avoid recommending per-file # ruff: noqa: F722 directives unless explicitly documented as an exception.

Learnt from: marcuscollins
Repo: diff-use/sampleworks PR: 132
File: src/sampleworks/utils/guidance_script_utils.py:585-586
Timestamp: 2026-03-05T16:30:40.514Z
Learning: In Python code, if enums subclassing StrEnum are used (e.g., GuidanceType, StructurePredictor), they serialize to plain strings with json.dump and pickle without special handling. Do not flag these as non-serializable in code reviews. Treat StrEnum values as strings for JSON serialization and ensure tests cover that behavior; no extra pickle handling needed.

Comment thread src/sampleworks/utils/imports.py
Comment on lines +17 to +21
# Must be set before any `import protpardelle...` happens. Respect an
# externally configured directory (e.g. when real weights are available).
os.environ.setdefault(
"PROTPARDELLE_MODEL_PARAMS", tempfile.mkdtemp(prefix="protpardelle_model_params_")
)

@coderabbitai coderabbitai Bot Jun 29, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Avoid allocating a temp dir inside setdefault().

setdefault() evaluates tempfile.mkdtemp(...) eagerly, so this creates an untracked temp directory on every import even when PROTPARDELLE_MODEL_PARAMS is already set. The same pattern appears again in tests/models/protpardelle/test_protpardelle_wrapper.py.

Suggested fix
-os.environ.setdefault(
-    "PROTPARDELLE_MODEL_PARAMS", tempfile.mkdtemp(prefix="protpardelle_model_params_")
-)
+_MODEL_PARAMS_DIR = None
+if "PROTPARDELLE_MODEL_PARAMS" not in os.environ:
+    _MODEL_PARAMS_DIR = tempfile.TemporaryDirectory(
+        prefix="protpardelle_model_params_"
+    )
+    os.environ["PROTPARDELLE_MODEL_PARAMS"] = _MODEL_PARAMS_DIR.name
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Must be set before any `import protpardelle...` happens. Respect an
# externally configured directory (e.g. when real weights are available).
os.environ.setdefault(
"PROTPARDELLE_MODEL_PARAMS", tempfile.mkdtemp(prefix="protpardelle_model_params_")
)
# Must be set before any `import protpardelle...` happens. Respect an
# externally configured directory (e.g. when real weights are available).
_MODEL_PARAMS_DIR = None
if "PROTPARDELLE_MODEL_PARAMS" not in os.environ:
_MODEL_PARAMS_DIR = tempfile.TemporaryDirectory(
prefix="protpardelle_model_params_"
)
os.environ["PROTPARDELLE_MODEL_PARAMS"] = _MODEL_PARAMS_DIR.name
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/models/protpardelle/conftest.py` around lines 17 - 21, The
`os.environ.setdefault(...)` usage is eagerly creating a temp directory via
`tempfile.mkdtemp(...)` even when `PROTPARDELLE_MODEL_PARAMS` is already set, so
update the setup in `conftest.py` to only call `mkdtemp` when the env var is
missing. Apply the same lazy pattern in `test_protpardelle_wrapper.py` where the
same `setdefault`/`mkdtemp` usage appears, using the existing
`PROTPARDELLE_MODEL_PARAMS` guard to avoid allocating untracked temp dirs on
import.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcuscollins Sounds good — thanks for tracking this in #286. I'll leave it as-is for this PR and won't insist further here.


🧠 Learnings used
Learnt from: k-chrispens
Repo: diff-use/sampleworks PR: 73
File: src/sampleworks/core/samplers/protocol.py:1-18
Timestamp: 2026-02-02T19:13:51.594Z
Learning: In Python files across the repo, do not suggest or add inline or file-level suppressions for Ruff F722 (jaxtyping dimension strings). Instead rely on a global ignore configured in pyproject.toml (as done by the sampleworks project). Verify that F722 is included in the global ignore, and avoid recommending per-file # ruff: noqa: F722 directives unless explicitly documented as an exception.

Learnt from: marcuscollins
Repo: diff-use/sampleworks PR: 132
File: src/sampleworks/utils/guidance_script_utils.py:585-586
Timestamp: 2026-03-05T16:30:40.514Z
Learning: In Python code, if enums subclassing StrEnum are used (e.g., GuidanceType, StructurePredictor), they serialize to plain strings with json.dump and pickle without special handling. Do not flag these as non-serializable in code reviews. Treat StrEnum values as strings for JSON serialization and ensure tests cover that behavior; no extra pickle handling needed.

Learnt from: DorisMai
Repo: diff-use/sampleworks PR: 272
File: tests/rewards/test_reward_function_contract.py:39-44
Timestamp: 2026-06-30T19:29:37.684Z
Learning: In this repository’s pytest test suite, do not automatically apply `pytest.mark.slow` just because a test requires a GPU. Use `pytest.mark.gpu` for GPU-dependent tests, and reserve `pytest.mark.slow` for tests that are genuinely slow based on measured per-test runtime (e.g., tests like `tests/rewards/test_reward_function_contract.py` may keep only `pytest.mark.gpu` when they run quickly). If a GPU test is fast, it should not be marked `slow`; if it is slow, add `slow` in addition to `gpu`.

Comment thread tests/runs/test_runner.py
Comment on lines +29 to +30
assert argv[:5] == ["pixi", "run", "-e", "rf3", "python"]
assert "run_grid_search.py" in argv[5] # the exact path varies by invocation

@coderabbitai coderabbitai Bot Jun 29, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Make this argv assertion hermetic against baked env detection.

runner._build_argv() switches to a direct Python executable whenever a baked pixi env exists, so this test can still fail on machines that already have .pixi/envs/rf3/bin/python even though the script-path check is now looser. Force the pixi path in the test setup before asserting the ["pixi", "run", ...] prefix.

Suggested hardening
monkeypatch.setenv("SAMPLEWORKS_FORCE_PIXI", "1")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/runs/test_runner.py` around lines 29 - 30, The argv assertion in the
test is not hermetic because runner._build_argv() can switch to a direct Python
executable when a baked pixi env is detected. Update the test setup for the
affected assertion in test_runner.py to force the pixi code path before checking
the ["pixi", "run", "-e", "rf3", "python"] prefix, using the existing test
fixture/monkeypatch around runner._build_argv() so the environment detection
cannot change the expected argv.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai Please make an issue for this but do not implement it. Please mention @manzuoni-astera and @xraymemory and tag the issue "P1" and "engineering"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
repo="diff-use/sampleworks"

gh issue create \
  --repo "$repo" \
  --title "Harden rf3_partial argv test against baked pixi env detection" \
  --body "## Summary
The argv assertion in \`tests/runs/test_runner.py\` (\`test_argv_for_rf3_partial_matches_bash\`, around Line 29) was relaxed to check the \`run_grid_search.py\` script path by substring instead of an exact match. However, \`runner._build_argv()\` switches to a direct Python executable whenever a baked pixi env (e.g. \`.pixi/envs/rf3/bin/python\`) is detected on the machine running the tests. This means the test can still fail non-deterministically on machines/CI runners that already have a baked pixi env, even with the looser substring check, because the \`[\"pixi\", \"run\", -e\", \"rf3\", \"python\"]\` prefix assertion assumes the pixi code path is always taken.

## Required change
- In \`tests/runs/test_runner.py\`, force the pixi code path before asserting the \`[\"pixi\", \"run\", \"-e\", \"rf3\", \"python\"]\` prefix, e.g. via \`monkeypatch.setenv(\"SAMPLEWORKS_FORCE_PIXI\", \"1\")\` in \`test_argv_for_rf3_partial_matches_bash\`.
- Verify no other tests in the same file rely on the default (non-forced) baked-env detection behavior in a way that would be affected.

## Affected files
- \`tests/runs/test_runner.py\`

## Acceptance criteria
- \`test_argv_for_rf3_partial_matches_bash\` passes deterministically regardless of whether a baked pixi env exists on the test-running machine.
- No regression to other runner tests that intentionally exercise baked-env detection (e.g. \`test_uses_baked_env_python_when_available\`).

## References
- PR: https://github.com/diff-use/sampleworks/pull/274
- Discussion: https://github.com/diff-use/sampleworks/pull/274#discussion_r3493412921

Requested by: `@marcuscollins`

cc: `@manzuoni-astera` `@xraymemory`" \
  --label "P1" \
  --label "engineering"

Length of output: 2092


@marcuscollins Done — I've opened #287 to track this without implementing it.

  • Title: "Harden rf3_partial argv test against baked pixi env detection"
  • Labels: P1, engineering
  • Mentions: @manzuoni-astera and @xraymemory
  • Links back to this PR and discussion comment for context

🐇📝


🧠 Learnings used
Learnt from: k-chrispens
Repo: diff-use/sampleworks PR: 73
File: src/sampleworks/core/samplers/protocol.py:1-18
Timestamp: 2026-02-02T19:13:51.594Z
Learning: In Python files across the repo, do not suggest or add inline or file-level suppressions for Ruff F722 (jaxtyping dimension strings). Instead rely on a global ignore configured in pyproject.toml (as done by the sampleworks project). Verify that F722 is included in the global ignore, and avoid recommending per-file # ruff: noqa: F722 directives unless explicitly documented as an exception.

Learnt from: marcuscollins
Repo: diff-use/sampleworks PR: 132
File: src/sampleworks/utils/guidance_script_utils.py:585-586
Timestamp: 2026-03-05T16:30:40.514Z
Learning: In Python code, if enums subclassing StrEnum are used (e.g., GuidanceType, StructurePredictor), they serialize to plain strings with json.dump and pickle without special handling. Do not flag these as non-serializable in code reviews. Treat StrEnum values as strings for JSON serialization and ensure tests cover that behavior; no extra pickle handling needed.

Learnt from: DorisMai
Repo: diff-use/sampleworks PR: 272
File: tests/rewards/test_reward_function_contract.py:39-44
Timestamp: 2026-06-30T19:29:37.684Z
Learning: In this repository’s pytest test suite, do not automatically apply `pytest.mark.slow` just because a test requires a GPU. Use `pytest.mark.gpu` for GPU-dependent tests, and reserve `pytest.mark.slow` for tests that are genuinely slow based on measured per-test runtime (e.g., tests like `tests/rewards/test_reward_function_contract.py` may keep only `pytest.mark.gpu` when they run quickly). If a GPU test is fast, it should not be marked `slow`; if it is slow, add `slow` in addition to `gpu`.

## Summary
- add Protpardelle-specific CLI flags for config path and sampling
controls
- forward config path into ProtpardelleWrapper construction
- forward sampling options into annotate_structure_for_protpardelle
- keep guidance --help lightweight by lazy-loading runtime dependencies

Fixes #258

## Validation
- uvx ty check src/sampleworks/utils/guidance_script_arguments.py
src/sampleworks/utils/guidance_script_utils.py
src/sampleworks/cli/guidance.py tests/cli/test_guidance_cli.py
tests/models/protpardelle/test_protpardelle_wrapper.py
- uvx ruff check src/sampleworks/utils/guidance_script_arguments.py
src/sampleworks/utils/guidance_script_utils.py
src/sampleworks/cli/guidance.py tests/cli/test_guidance_cli.py
tests/models/protpardelle/test_protpardelle_wrapper.py
- PYTHONPATH=src uvx --python 3.12 --with pytest pytest --noconftest
tests/cli/test_guidance_cli.py -q
- PYTHONPATH=src uvx --python 3.12 --with pytest pytest --noconftest
tests/models/protpardelle/test_protpardelle_wrapper.py -q (skipped
locally: Protpardelle not installed)

---------

Co-authored-by: Marcus D. Collins <marcus.collins@astera.org>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/sampleworks/core/samplers/edm.py (1)

416-428: 🗄️ Data Integrity & Integration | 🔴 Critical | 🏗️ Heavy lift

model_wrapper.step() now breaks non-Protpardelle wrappers

src/sampleworks/core/samplers/edm.py:425-428 passes eps as a third positional argument, but Boltz, Protenix, and RF3 still define step(x_t, t, *, features=...). That will raise TypeError on those sampling paths. Update the protocol and wrapper signatures together, or gate this call to the Protpardelle-only implementation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/core/samplers/edm.py` around lines 416 - 428, The
`model_wrapper.step()` call in the EDM sampler is now passing `eps`
positionally, which breaks wrappers like `Boltz`, `Protenix`, and `RF3` that
still only accept `step(x_t, t, *, features=...)`. Fix this by updating the
shared protocol and the wrapper implementations together so their `step`
signatures match the new call, or by restricting the extra `eps` argument to the
Protpardelle-specific path. Check the `model_wrapper.step()` invocation in the
EDM sampler and the corresponding `step` methods on the wrapper classes to keep
the interface consistent.
🧹 Nitpick comments (1)
src/sampleworks/cli/guidance.py (1)

13-16: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Redundant local logger import duplicates the new top-level import.

A top-level from loguru import logger is already added at the module level (lines 6-7), making this local re-import inside main() dead code. Lazy-loading get_model_and_device/run_guidance to keep --help light makes sense, but loguru is not a heavy dependency, so duplicating the import adds noise without benefit.

♻️ Proposed fix
 def main(argv: list[str] | None = None) -> int:
     config = GuidanceConfig.from_cli(argv)
 
-    from loguru import logger
-
     from sampleworks.utils.guidance_script_utils import get_model_and_device, run_guidance
 
     logger.info(f"Running guidance with config: {config}")

As per coding guidelines, "No dead code and no compatibility shims for hypothetical users."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/cli/guidance.py` around lines 13 - 16, The local re-import of
logger inside main() is redundant because logger is already imported at module
scope. Remove the inner from loguru import logger from main() and keep the lazy
import only for get_model_and_device and run_guidance so guidance.py uses the
existing top-level logger symbol consistently.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/sampleworks/core/samplers/edm.py`:
- Around line 416-428: The `model_wrapper.step()` call in the EDM sampler is now
passing `eps` positionally, which breaks wrappers like `Boltz`, `Protenix`, and
`RF3` that still only accept `step(x_t, t, *, features=...)`. Fix this by
updating the shared protocol and the wrapper implementations together so their
`step` signatures match the new call, or by restricting the extra `eps` argument
to the Protpardelle-specific path. Check the `model_wrapper.step()` invocation
in the EDM sampler and the corresponding `step` methods on the wrapper classes
to keep the interface consistent.

---

Nitpick comments:
In `@src/sampleworks/cli/guidance.py`:
- Around line 13-16: The local re-import of logger inside main() is redundant
because logger is already imported at module scope. Remove the inner from loguru
import logger from main() and keep the lazy import only for get_model_and_device
and run_guidance so guidance.py uses the existing top-level logger symbol
consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f581f2ae-9bd1-4cb2-9dbc-72bd9be16919

📥 Commits

Reviewing files that changed from the base of the PR and between 6713208 and 4ac3e5b.

📒 Files selected for processing (7)
  • src/sampleworks/cli/guidance.py
  • src/sampleworks/core/samplers/edm.py
  • src/sampleworks/models/protpardelle/wrapper.py
  • src/sampleworks/utils/guidance_script_arguments.py
  • src/sampleworks/utils/guidance_script_utils.py
  • tests/cli/test_guidance_cli.py
  • tests/models/protpardelle/test_protpardelle_wrapper.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/sampleworks/utils/guidance_script_arguments.py
  • src/sampleworks/utils/guidance_script_utils.py
  • tests/models/protpardelle/test_protpardelle_wrapper.py
  • src/sampleworks/models/protpardelle/wrapper.py

@k-chrispens k-chrispens left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw some of the recent issues as I left work today and got sucked into reading the rest of this PR. If this is ready for review, these are my thoughts. The most important thing is that it doesn't appear that the modelwrapper step function is aligned with how it is called in the sampler. It would also be good if the protpardelle wrapper got plugged into the other modelwrapper integration tests in tests/conftest.py - I think several of these tests would fail in the current implementation

getattr(config, "model_checkpoint", None),
config.model,
method=getattr(config, "method", None),
protpardelle_config_path=getattr(config, "protpardelle_config_path", None),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we can remove this here, or at least make it more general rather than including a model specific arg here? I know there are a lot of models that use hydra YAML configs, so I expect we will probably need to generalize this at some point

# I need to modify the Protocol itself. @Michael Anzuoni
with torch.set_grad_enabled(allow_gradients):
x_hat_0 = model_wrapper.step(noisy_state, t_hat, features=features)
x_hat_0 = model_wrapper.step(noisy_state, t_hat, eps, features=features)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused here, because this no longer appears to be needed in the model step function that is in this PR. So I think #283 should be closed.

@@ -0,0 +1,766 @@
"""Wrapper for Protpardelle-1c models.

Follows the ``StructureModelWrapper`` protocol in

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct - this shouldn't be following the StructureModelWrapper. It should be (and I believe is, though that would require an isinstance check) a FlowModelWrapper

models (``task: "ai-allatom"``, e.g. the model defined by ``cc89.yaml``).
These models take a protein sequence as their only conditioning input and
generate an all-atom structure by running the full reverse-diffusion
trajectory internally. Because a single ``step`` call encapsulates the entire

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also confused by this - the step call shouldn't be encapsulating the whole diffusion loop (and currently isn't)


# To understand all these arguments better, you can study the (complicated)
# Protpardelle.sample method. Hints: partial diffusion is enabled and cc.enabled = False!
# https://github.com/ProteinDesignLab/protpardelle-1c/src/protpardelle/core/models.py

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default=1.0,
help="Temperature applied to Protpardelle aatype logits",
)
parser.add_argument(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need this since we will never be doing design, right?

structure,
ensemble_size=args.ensemble_size,
num_steps=args.num_diffusion_steps,
s_churn=getattr(args, "protpardelle_s_churn", 40.0),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine several of these are not going to be adjusted - not that we want to restrict the number of knobs, but especially the design arguments don't seem relevant to include here. Also, several of these are not consumed anywhere aside from in the annotation on the structure.

str(template_job.device),
template_job.model_checkpoint,
template_job.model,
template_job.model, # this is not actually the model, it's the model name, e.g. boltz2

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we just rename this to model_name?

atoms implied by the input sequence (the ``atom_mask`` in the
conditioning).

Parameters

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring isn't correct

self,
x_t: Float[Tensor, "batch atoms 3"],
t: Float[Tensor, "*batch"] | float,
sigma_float: float,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused parameter that should be removed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants