Skip to content

Rework the Python worker process spec (#378)#423

Draft
intendednull wants to merge 19 commits into
piercefreeman:mainfrom
intendednull:worker-python-spec-rework
Draft

Rework the Python worker process spec (#378)#423
intendednull wants to merge 19 commits into
piercefreeman:mainfrom
intendednull:worker-python-spec-rework

Conversation

@intendednull

@intendednull intendednull commented May 29, 2026

Copy link
Copy Markdown

[WIP] will mark when ready for final review, but any feedback is appreciated

Moves all of the fallible I/O from config resolution to happen before worker spawn

Fixes #378

Address review findings on the piercefreeman#378 rework (all behavior-preserving):
- Re-log the resolved PYTHONPATH: once at prepare() time (with working_dir
  and whether the bundled package root or cwd was used) and on the per-spawn
  line, restoring the operability signal dropped in the rework.
- Derive Clone + Debug on PreparedConfig and Spec for parity with Config and
  to allow debug-logging resolved launch values.
- Document that the environment is snapshotted at prepare() and not re-read
  on worker recycle, and that a prepare_blocking panic is surfaced as
  PrepareError::Join rather than resumed.
- Extract the payload-wait + timeout-to-SpawnError mapping into a
  Timeouts::await_payload method; spawn() now calls it directly.
- Simplify is_executable_file to a match with an is_file guard.
- Pull the unix exec-bit test into a named is_executable helper with a
  named EXECUTABLE_BITS constant (no std is_executable exists).
Generalize the lessons from the piercefreeman#378 worker-python rework into the
AGENTS.md code_feedback guide, per its Code Review process: keep blocking
IO off the async hot path (materialize once via prepare()/spawn_blocking),
and don't panic on infallible trait paths (return a typed error from an
async constructor instead).
@MOZGIII

MOZGIII commented May 29, 2026

Copy link
Copy Markdown
Collaborator

Thanks! I'll review this is a couple days, a bit busy currently.

@MOZGIII MOZGIII self-requested a review May 29, 2026 07:54

@MOZGIII MOZGIII left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A but preliminary, as I didn't look too deep, but nonetheless some feedback. It seems like this PR also changes timeouts - I suggest we restrict this PR to python worker refactors only, as to avoid deliberation on the timeouts structure and etc; if possible of course.

@intendednull

Copy link
Copy Markdown
Author

yes that sounds good to me, thanks for the clarification

Reviewer (MOZGIII, piercefreeman#423) asked to restrict this PR to python worker
refactors only and defer the timeout-structure rework: "this PR also
changes timeouts - I suggest we restrict this PR to python worker
refactors only, as to avoid deliberation on the timeouts structure".

- Revert worker-process and worker-process-pool back to main (the
  ShutdownParams->Timeouts rename and await_payload extraction are gone).
- worker-python keeps its refactor (PreparedConfig caching, panic-free
  prepare_spawn_params, symlink/exec-aware find_executable) but no longer
  owns a timeouts field; prepare_spawn_params emits the existing hardcoded
  SpawnParams/ShutdownParams shape, identical to main.

@intendednull intendednull left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@claude looks like the config revert clobbered a few lines. lets make sure the reversion is clean, no trace left behind

Comment thread crates/lib/worker-python/src/config.rs Outdated
Comment thread crates/lib/worker-python/src/lib.rs Outdated
Comment thread crates/lib/worker-python/src/lib.rs Outdated
Comment thread crates/lib/worker-python/src/lib.rs Outdated
Comment thread AGENTS.md Outdated
- config.rs: revert cosmetic trailing-period doc edits on the three
  unchanged Config fields (diff clutter, no semantic change).
- lib.rs: trim the Spec doc to what it is and rely on the PreparedConfig
  link; restore the timeout block (comment + Duration::from_secs values)
  to match main byte-for-byte; re-import Duration instead of fully
  qualifying.
- AGENTS.md: reword the panic-free rule per review wording.

@intendednull intendednull left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small organizational nit. Please mark resolved when done

Comment thread crates/lib/worker-python/src/config.rs Outdated
Comment thread crates/lib/worker-python/src/config.rs
config.rs is now a pure structural definition of Config and its
builders. The resolution work — default_runner and the PATH
executable-lookup family (find_executable/is_executable_file/
is_executable) plus their tests — moves next to its only consumer,
Config::prepare in prepared.rs.

Addresses the two organizational review comments on PR piercefreeman#423.
Comment thread crates/lib/worker-python/src/spec_factory.rs
prepared.rs is an extension of the config definition, not a peer
module. Nest it as config::prepared using the project's file+folder
layout (src/config.rs + src/config/prepared.rs) and re-export
PreparedConfig/PrepareError through config. Crate-root public API is
unchanged.

Addresses PR piercefreeman#423 review comment.
main accepted any is_file(); this crate added the 0o111 exec-bit
check. Assert is_executable_file rejects the crate's own (non-exec)
Cargo.toml so that fix is covered. Seam-free, no temp file.
@piercefreeman

Copy link
Copy Markdown
Owner

@intendednull Thanks for the PR! Waymark's still a really new project and we haven't yet formalized our contributor or AI guidelines yet. Will circle back to this ticket once we get that nailed.

Both variants carried the cause only via #[source], so Display-only
call sites (smoke's eprintln) printed the context line with no
underlying io::Error or JoinError detail. Sibling worker crates
(managed-process, worker-process, worker-remote-bringup) all embed
the source in the message; match that.
The comment referenced `main`'s old is_file()-only behavior, which
stops being true the moment this branch merges into main. State the
invariant under test instead.
Worker launch config was resolved by Config::prepare(), welding host
probing onto the declarative config type, and each of the three bins
repeated the same prepare-then-wire dance.

Split the two concerns:
- Resolution is now a free `resolve(Config) -> SpecFactory` whose probe
  runs once under spawn_blocking; Config is inert declarative data.
- `SpecFactory` (was `PreparedConfig`) is named for its job: `build(addr)`
  mints a `Spec` once the late-bound bridge address is known. `Spec`
  fields are private, constructible only via the factory.
- New `worker-python-bringup` crate owns the compose; each bin collapses
  to a single `start(...)` call. `worker-remote-bringup` stays generic
  and python-free.

`PrepareError` -> `ResolveError`; `config/prepared.rs` moves to top-level
`resolve.rs` (resolution is no longer an extension of config). AGENTS.md
rule examples updated to the new names.

@intendednull intendednull left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@claude minor adjustments, please fix

Comment thread crates/lib/worker-python/src/lib.rs Outdated
Comment on lines +36 to +58
let factory = &self.factory;

tracing::info!(python_path = %python_path, ?reservation_id, "configured python path for worker");

// Build the command
let mut command = tokio::process::Command::new(&self.config.script_path);
command.args(&self.config.script_args);
let mut command = tokio::process::Command::new(&factory.program);
command.args(&factory.args);
command
.arg("--bridge")
.arg(self.bridge_server_addr.to_string())
.arg("--worker-id")
.arg(reservation_id.to_string());

// Add user modules
for module in &self.config.user_modules {
for module in &factory.user_modules {
command.arg("--user-module").arg(module);
}

command.env("PYTHONPATH", python_path);

if let Some(dir) = working_dir {
tracing::info!(?dir, "using package root for worker process");
command.current_dir(dir);
} else {
// TODO: move this fallible initialization outside of this impl.
let cwd = std::env::current_dir().expect("failed to resolve current directory");
tracing::info!(
?cwd,
"package root missing, using current directory for worker process"
);
command.current_dir(cwd);
}
command.env("PYTHONPATH", &factory.python_path);
command.current_dir(&factory.working_dir);

tracing::info!(
?reservation_id,
working_dir = %factory.working_dir.display(),
python_path = %factory.python_path,
"prepared python worker spawn params"
);

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are doing SpecFactory -> Command. Lets factor this out. It could either be a standalone fn, or maybe it would make more sense to attach to SpecFactory. Lets follow existing patterns here, whatever practice is favored in this codebase. I'm leaning towards standalone, so lets just do that for now if unclear.

Comment thread crates/lib/worker-python/src/spec_factory.rs
Comment thread crates/lib/worker-python/src/resolve.rs Outdated

/// The blocking half of [`resolve`]. Must only be called from a blocking
/// context (e.g. inside `spawn_blocking`).
fn probe(config: Config) -> Result<SpecFactory, ResolveError> {

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn probe(config: Config) -> Result<SpecFactory, ResolveError> {
fn resolve_blocking(config: Config) -> Result<SpecFactory, ResolveError> {

Per PR piercefreeman#423 review on the resolve/SpecFactory rework:
- Rename the module file resolve.rs -> spec_factory.rs (named for the
  type it defines, not the verb).
- Rename the private blocking half probe() -> resolve_blocking() to
  pair with the async resolve().
- Extract the SpecFactory -> Command assembly out of
  prepare_spawn_params into a standalone build_command(); the spawn
  log stays in prepare_spawn_params so build_command is a pure builder.
The pre-rework Spec exposed its fields publicly; keep that visibility
rather than sealing construction behind a pub(crate) constructor. Drop
Spec::new and have SpecFactory::build construct the struct directly.
With Spec's fields public again, build() was a trivial constructor that
direct struct construction expresses more clearly (the house style; the
pre-rework code built Spec the same way). The bringup closure now
constructs `Spec { bridge_server_addr, factory }` directly, and
SpecFactory is pure resolved data with no methods.
Polish only the docs this PR added or changed; pre-existing docs carried
from main are left untouched.

- resolve(): add the `# Errors` section the fallible-fn docs use.
- is_executable_file/is_executable: imperative mood, matching the file.
- build_command: plain intra-doc link form for the Command type.
- worker-python-bringup: add a module doc, document StartError + its
  variants, and capitalize Python in the start() doc.
- config.rs script_path: intra-doc-link resolve.
- AGENTS.md: the runtime-IO rule cited var("PATH"); the code uses
  var_os("PATH").
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rework how the Python worker process spec is implemented

3 participants