Describe the feature or problem you'd like to solve
When a Copilot CLI agent invokes a skill, the resulting tool calls (bash, glob, etc.) are emitted as flat children of the root invoke_agent span with no skill attribution. There is no intermediate span or attribute linking tool calls to the skill that triggered them, making it impossible to measure skill-level latency, debug skill failures, or build per-skill dashboards from OTEL trace data.
Proposed solution
Introduce an execute_skill (or similar) span that:
- Wraps the tool calls triggered by the skill invocation, so they become children of the skill span rather than the root agent span.
- Carries a skill.name attribute identifying which skill was executed.
- Is a child of the invoke_agent span, preserving the existing hierarchy.
Example desired trace:
invoke_agent (root)
├── chat model
├── execute_skill "my-skill"      ← NEW
│   ├── execute_tool glob         ← child of skill span
│   ├── execute_tool bash         ← child of skill span
│   └── execute_tool bash         ← child of skill span
├── chat model
└── execute_tool bash             ← not part of a skill
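To make the intended parent/child relationships concrete, here is a minimal sketch using a small in-memory span model. It is not the Copilot CLI implementation and does not use an OpenTelemetry SDK; the `Span` class, the `start_span` helper, and the `finished` list are hypothetical names used only for illustration. The point is that once the skill span is the current span, tool spans started inside it pick it up as their parent.

```python
import contextvars
import itertools
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical in-memory span model (an assumption, not a real SDK type).
@dataclass
class Span:
    span_id: int
    name: str
    parent_span_id: Optional[int]
    attributes: dict = field(default_factory=dict)


_ids = itertools.count(1)
_current = contextvars.ContextVar("current_span", default=None)
finished = []  # spans are appended here when they end


@contextmanager
def start_span(name, attributes=None):
    """Start a span whose parent is whatever span is current right now."""
    parent = _current.get()
    span = Span(next(_ids), name, parent.span_id if parent else None,
                dict(attributes or {}))
    token = _current.set(span)
    try:
        yield span
    finally:
        _current.reset(token)
        finished.append(span)


with start_span("invoke_agent") as agent:
    with start_span("chat model"):
        pass
    # Proposed: the skill gets its own span carrying skill.name.
    with start_span("execute_skill", {"skill.name": "my-skill"}) as skill:
        with start_span("execute_tool glob"):
            pass
        with start_span("execute_tool bash"):
            pass
    # A tool call outside any skill stays a direct child of the agent span.
    with start_span("execute_tool bash") as loose:
        pass

skill_children = [s.name for s in finished if s.parent_span_id == skill.span_id]
```

Here `skill_children` contains only the two tool spans started inside the skill, while `loose` is parented to the agent span, which matches the desired trace.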
Alternatively (minimal version): if nesting is not feasible, adding a skill.name attribute to each execute_tool span that was triggered within a skill invocation context would also solve the problem.
Benefits
- Skill-level latency measurement: Users can measure how long a skill takes end-to-end, rather than manually summing individual tool call durations.
- Tool call attribution: Clearly distinguish which tool calls belong to a skill invocation vs. general agent reasoning, enabling targeted debugging and optimization.
- Faster failure diagnosis: When a skill fails, users can immediately identify which child tool call failed without reading through command arguments.
- Aggregated dashboards: Teams can build per-skill usage and performance dashboards across sessions, which is essential for monitoring custom skill reliability at scale.
Example prompts or workflows
- Debugging a failed skill: A custom skill fails intermittently. The user exports OTEL traces and filters for execute_skill spans with error status. They drill into the child execute_tool spans to see exactly which bash command failed, without reading every tool call in the session.
- Measuring skill performance over time: A team ships a custom skill and wants to track its p50/p95 latency across sessions. They query their trace backend for execute_skill spans where skill.name = "my-skill" and chart duration over time. Today this is impossible without manually parsing tool call arguments.
- Attributing token/tool usage to skills: A user runs a session where the agent invokes three different skills. They want to see how many tool calls each skill made and how much time each consumed. With skill-level spans, this is a simple trace query. Without them, all tool calls are indistinguishable siblings under invoke_agent.
- Building an observability dashboard: A team sends OTEL traces to their backend and builds a dashboard showing skill invocation frequency, success rate, and latency. This requires a reliable skill.name attribute or dedicated span; inferring skill boundaries from bash command content is fragile and breaks when scripts change.
- Auditing skill usage in CI/automation: In cloud agent jobs, a team wants to verify that the correct skills were invoked and completed successfully. Skill-level spans would make this a simple trace query rather than log parsing.
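As an illustration of the kind of query these workflows rely on, the sketch below aggregates exported spans per skill. The span shape (dicts with `name`, `attributes`, `duration_ms`, `status`) is an assumption made for this example and not the actual export format; a real backend would use its own query language. Percentiles use the simple nearest-rank method.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_skills(spans):
    """Group execute_skill spans by skill.name and compute basic stats."""
    by_skill = defaultdict(list)
    for span in spans:
        if span["name"] == "execute_skill":
            by_skill[span["attributes"]["skill.name"]].append(span)

    summary = {}
    for skill, group in by_skill.items():
        durations = [s["duration_ms"] for s in group]
        errors = sum(1 for s in group if s["status"] == "error")
        summary[skill] = {
            "invocations": len(group),
            "error_rate": errors / len(group),
            "p50_ms": percentile(durations, 50),
            "p95_ms": percentile(durations, 95),
        }
    return summary


# Made-up sample data, purely for illustration.
spans = [
    {"name": "execute_skill", "attributes": {"skill.name": "my-skill"},
     "duration_ms": 1200, "status": "ok"},
    {"name": "execute_skill", "attributes": {"skill.name": "my-skill"},
     "duration_ms": 800, "status": "error"},
    {"name": "execute_skill", "attributes": {"skill.name": "my-skill"},
     "duration_ms": 1000, "status": "ok"},
    {"name": "execute_skill", "attributes": {"skill.name": "my-skill"},
     "duration_ms": 3000, "status": "ok"},
    {"name": "execute_tool bash", "attributes": {},
     "duration_ms": 50, "status": "ok"},
]
summary = summarize_skills(spans)
```

None of this is possible today without parsing tool call arguments, because there is no span or attribute to group by.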
Additional context
- Hooks (preToolUse/postToolUse) also don't carry skill context: toolName is "bash", "glob", etc., with no reference to the parent skill. Adding skillName to hook payloads would be a complementary improvement.
- The skill tool invocation itself may appear as an execute_tool skill span, but subsequent tool calls triggered by the skill are not linked to it via parentSpanId.
- Observed on Copilot CLI v1.0.60.
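For the hook side, a sketch of what the complementary change could look like. Only `toolName` (existing) and `skillName` (proposed) are taken from this issue; the payload is reduced to those two fields, and `build_hook_payload` and `count_by_skill` are hypothetical helpers, not part of the CLI.

```python
def build_hook_payload(tool_name, skill_name=None):
    """Build a reduced hook payload; skillName is only set inside a skill."""
    payload = {"toolName": tool_name}
    if skill_name is not None:
        payload["skillName"] = skill_name
    return payload


def count_by_skill(payloads):
    """Example hook consumer: count tool calls per skill."""
    counts = {}
    for payload in payloads:
        key = payload.get("skillName", "(no skill)")
        counts[key] = counts.get(key, 0) + 1
    return counts


payloads = [
    build_hook_payload("glob", "my-skill"),
    build_hook_payload("bash", "my-skill"),
    build_hook_payload("bash"),
]
counts = count_by_skill(payloads)
```

Keeping `skillName` optional means existing hook consumers that only read `toolName` would keep working unchanged.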