feat(orchestrator): implement skill materialization during agent spawn#1229
feat(orchestrator): implement skill materialization during agent spawn#1229geoffjay wants to merge 2 commits into
Conversation
- Add MaterializeResult struct (written/skipped/not_found) - Add materialize_skills() async fn in skills.rs: writes skill content to <working_dir>/.claude/skills/<name>/SKILL.md before agent launch - Skips files that already exist (agent-local skills take precedence) - Integrate into spawn_agent() after session create, before build_claude_command - Missing skills produce warning, not hard failure - Logs written/skipped counts at info level, missing at warn level Closes #1211
geoffjay
left a comment
There was a problem hiding this comment.
Review: feat(orchestrator): implement skill materialization during agent spawn
Stack Position
Branch issue-1211 is stacked on issue-1210 (PR #1228), which is itself stacked on epic-1209 (PR #1226). Full chain: main ← epic-1209 ← issue-1210 ← issue-1211.
This PR cannot merge until #1226 and #1228 both clear. Status at review time:
- PR #1226: re-submitted for re-review (
review-agentlabel set) — blocking - PR #1228: reviewed with comment,
merge-queueheld pending #1226
The code in this PR is reviewed in isolation below.
Non-Blocking: target_file.exists() Is Synchronous in an Async Function
materialize_skills is async and uses tokio::fs for all other I/O, but the "skip if already exists" check uses the blocking std::path::Path::exists():
// skills.rs ~line 104 — synchronous stat call in async context:
if target_file.exists() {
result.skipped.push(name.clone());
continue;
}
// vs. the async calls immediately below:
tokio::fs::create_dir_all(&target_dir).await?;
tokio::fs::write(&target_file, &skill.content).await?;For the typical .agentd/skills/ directory this is negligible, but it's an inconsistency that could mask problems on slow filesystems or network mounts. Consistent fix:
if tokio::fs::try_exists(&target_file).await.unwrap_or(false) {
result.skipped.push(name.clone());
continue;
}try_exists is preferred over metadata().is_ok() because it returns Ok(false) for a missing file rather than Err, and the .unwrap_or(false) correctly falls through to a write attempt if the stat itself fails.
Non-Blocking: I/O Write Errors Land in not_found
MaterializeResult has three buckets: written, skipped, not_found. In materialize_skills, the not-found path is correct — skill names with no matching entry in discovered_skills go to not_found. But the ? propagation on create_dir_all and write would bubble the error up and abort the entire call, not route to not_found.
More importantly, if a future refactor changes the write errors to be swallowed and categorized, they'd naturally fall into not_found — but a skill that was found and failed to write is categorically different from a skill that was never discovered. A failed field would make this distinction explicit and is cheap to add:
#[derive(Debug, Default, PartialEq)]
pub struct MaterializeResult {
pub written: Vec<String>,
pub skipped: Vec<String>,
pub not_found: Vec<String>,
pub failed: Vec<String>, // found but couldn't be written
}The integration in manager.rs already logs not_found names as "skills not found" — having a separate failed vector would give operators the right signal when skill files exist but can't be materialized (permissions, disk full, etc.).
Non-Blocking: discover_all_skills() Is Synchronous Inside Async spawn_agent
spawn_agent in manager.rs is async, but discover_all_skills() uses std::fs::read_dir throughout. The call is:
let discovered = crate::skills::discover_all_skills(); // blocking sync
match crate::skills::materialize_skills(...).await {Same observation as PR #1226 review — for the typical small skills directory this is fine in practice. Worth noting as a future optimization (pass discovered skills in from the caller, or spawn via tokio::task::spawn_blocking). Not worth blocking the PR on.
What's Working Well
- Soft failure is correct:
Err(e) => { warn!(...); }— agent still starts even if skill materialization fails. This is the right tradeoff; a missing skill should not prevent an agent from launching. - Guard avoids unnecessary work:
if !agent.config.skills.is_empty()skips both discovery and materialization when no skills are requested. - Placement in
spawn_agent: materialization aftercreate_sessionbut beforebuild_claude_commandis correct — the working directory exists at that point, and skills must be present before the subprocess starts. - Target path convention:
.claude/skills/<name>/SKILL.mdis consistent with the Claude Code skill directory layout. - Tracing fields:
%result.written.len(),%result.skipped.len(),%result.not_found.len()use%(Display) for counts and?(Debug) for the string vecs — correct. HashMapindex inmaterialize_skills: building aname → &Skillmap before iteratingskill_namesis O(n) rather than O(n²) — correct choice.MaterializeResultderives:Default+PartialEqare the right set; noCloneneeded since the caller doesn't retain it.
Summary
Two non-blocking observations: the target_file.exists() sync/async inconsistency (swap for tokio::fs::try_exists) and the missing failed bucket in MaterializeResult for write errors distinct from discovery failures. The synchronous discover_all_skills() in an async context is a pre-existing pattern noted in #1226 and tracked separately.
The integration is correct and the soft-failure behavior is the right call. This PR is waiting on #1226 → #1228 to clear before it can merge; no action needed here beyond that.
Adds
materialize_skills()to write skill files into an agent's.claude/skills/directory before the Claude process is launched.Changes:
MaterializeResultstruct tracks which skills were written, skipped (existing file), or not foundmaterialize_skills()writes<working_dir>/.claude/skills/<name>/SKILL.mdfor each assigned skillspawn_agent()between session creation andbuild_claude_command()warn!log, not a hard failure — agent still launchesWorktree approach: Skills are written to the source
working_dirpre-launch. Because.claude/is typically in.gitignore, worktree agents discover skills via the parent directory already wired throughadditional_dirs.Closes #1211