invoke_agent wrapper span omits tool-call turns and gen_ai.tool.definitions, violating GenAI semconv #172

@ninghu

Summary

microsoft-opentelemetry 1.3.0 emits the wrapper invoke_agent <agent-name> span with gen_ai.input.messages / gen_ai.output.messages in the new spec-compliant {role, parts:[{type, content}]} shape (great, that closed #159), but the message list contains only:

  • the first user prompt, and
  • the final assistant text reply.

It does not contain the intermediate assistant tool_call turns or the tool-role tool_call_response turns, and the span does not carry gen_ai.tool.definitions. All of that data is correctly emitted on child spans (execute_tool <name> and the inner chat <model> spans), but is never aggregated onto the wrapper invoke_agent span.

Why this matters

Consumers that read tool usage off the invoke_agent span see an empty tool-interaction history. Concrete impact: Azure AI Foundry's cloud trace-evaluation pipeline builds its per-item tool_calls / tool_definitions from this span only, so:

  • builtin.task_adherence fails with "no tool interactions provided" on every item.
  • builtin.intent_resolution mis-scores any prompt whose resolution depended on tool turns.
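
To illustrate why such a consumer comes up empty, here is a minimal sketch of a reader that collects tool parts from a messages attribute. The function name extract_tool_interactions and the sample reply text are hypothetical; this is not Azure AI Foundry's actual implementation, only an assumption about how a consumer of this attribute might work.

```python
import json


def extract_tool_interactions(messages_json: str) -> dict:
    """Collect tool_call and tool_call_response parts from a messages JSON string."""
    calls, responses = [], []
    for message in json.loads(messages_json):
        for part in message.get("parts", []):
            if part.get("type") == "tool_call":
                calls.append(part)
            elif part.get("type") == "tool_call_response":
                responses.append(part)
    return {"tool_calls": calls, "tool_responses": responses}


# Shape of what the wrapper span carries today: the first user prompt and
# the final assistant text reply only (sample content is made up).
observed = json.dumps([
    {"role": "user", "parts": [{"type": "text", "content": "Weather in Paris?"}]},
    {"role": "assistant", "parts": [{"type": "text", "content": "It is rainy in Paris."}]},
])

print(extract_tool_interactions(observed))
# prints {'tool_calls': [], 'tool_responses': []}
```

With no tool_call or tool_call_response parts in the list, any evaluator built on a reader like this has nothing to score.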

Spec gap

Per the OpenTelemetry GenAI semantic conventions (now in open-telemetry/semantic-conventions-genai), docs/gen-ai/gen-ai-agent-spans.md defines gen_ai.invoke_agent.internal (and .client) with this normative wording for gen_ai.input.messages:

Instrumentations MUST follow Input messages JSON schema. … Messages MUST be provided in the order they were sent to the model.

The schema's Role enum is system | user | assistant | tool, and ChatMessage.parts allows ToolCallRequestPart (type:"tool_call") and ToolCallResponsePart (type:"tool_call_response"). The spec's own example for gen_ai.input.messages on invoke_agent is exactly the missing pattern:

[
  {"role": "user", "parts": [{"type": "text", "content": "Weather in Paris?"}]},
  {"role": "assistant", "parts": [{
    "type": "tool_call",
    "id": "call_VSPygqKTWdrhaFErNvMV18Yl",
    "name": "get_weather",
    "arguments": {"location": "Paris"}
  }]},
  {"role": "tool", "parts": [{
    "type": "tool_call_response",
    "id": "call_VSPygqKTWdrhaFErNvMV18Yl",
    "result": "rainy, 57°F"
  }]}
]

The spec also recognizes gen_ai.tool.definitions as an opt-in attribute on the invoke_agent span (Tool Definitions JSON schema).

Reproduction

  1. Wrap a LangGraph create_react_agent with two tools (get_current_weather, get_forecast) using microsoft-opentelemetry==1.3.0.
  2. Send a prompt that forces multiple tool calls, e.g. "Which is warmer right now: NYC, London, or Tokyo? Use the weather tool to check.".
  3. Query App Insights for the wrapper span:
    dependencies
    | where name startswith "invoke_agent "
    | project customDimensions
  4. Observe that gen_ai.input.messages contains a single user message, gen_ai.output.messages contains a single assistant text message, and the span carries neither gen_ai.tool.definitions nor any tool turns.
  5. Inspect child spans (execute_tool get_current_weather, chat gpt-4o-mini) — they correctly carry gen_ai.tool.call.id, gen_ai.tool.call.result, gen_ai.tool.definitions, etc.
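
The check in step 4 can be automated with a small sketch like the one below. It assumes the customDimensions row has already been loaded into a Python dict whose attribute values are JSON strings; the helper name diagnose_wrapper_span and the sample row are hypothetical.

```python
import json


def diagnose_wrapper_span(custom_dimensions: dict) -> list[str]:
    """Return human-readable findings for a wrapper span's attributes."""
    findings = []
    if "gen_ai.tool.definitions" not in custom_dimensions:
        findings.append("gen_ai.tool.definitions is absent")

    # Assumption: the attribute value is a JSON-encoded list of messages.
    messages = json.loads(custom_dimensions.get("gen_ai.input.messages", "[]"))
    roles = [message.get("role") for message in messages]
    if "tool" not in roles:
        findings.append("no tool-role message in gen_ai.input.messages")

    part_types = {
        part.get("type")
        for message in messages
        for part in message.get("parts", [])
    }
    if "tool_call" not in part_types:
        findings.append("no tool_call part in gen_ai.input.messages")
    return findings


# Hypothetical row mirroring the observed behavior: one user message only.
row = {
    "gen_ai.operation.name": "invoke_agent",
    "gen_ai.input.messages": json.dumps([
        {"role": "user", "parts": [{"type": "text", "content": "Which is warmer?"}]},
    ]),
}
for finding in diagnose_wrapper_span(row):
    print(finding)
```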

Observed wrapper attributes (12 total, no tool data)

_MS.ResourceAttributeId, _MS.GenAIContentId,
gen_ai.operation.name, gen_ai.agent.name, gen_ai.agent.id,
gen_ai.request.model, gen_ai.provider.name, gen_ai.request.choice.count,
gen_ai.usage.input_tokens, gen_ai.usage.output_tokens,
gen_ai.input.messages, gen_ai.output.messages

Expected behavior

The wrapper should aggregate data from the child chat / execute_tool spans (or capture the same data directly) onto the invoke_agent <name> span so that:

  • gen_ai.input.messages contains the full ordered chat history including every intermediate assistant-role message with tool_call parts and every tool-role message with tool_call_response parts.
  • gen_ai.output.messages reflects the final assistant choice(s), still per the output-messages JSON schema.
  • gen_ai.tool.definitions is populated with the tool schemas exposed to the agent.
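
The message aggregation described above could look roughly like the following sketch. It is not the library's code: the child-span record shape (kind, output_messages, call_id, result) and the function name aggregate_wrapper_messages are assumptions made for illustration, and the tool_call_response part mirrors the field names in the spec example quoted earlier. It covers the two message attributes only, not gen_ai.tool.definitions.

```python
def aggregate_wrapper_messages(user_prompt: str, child_spans: list[dict]) -> tuple[list, list]:
    """Build (input_messages, output_messages) from child spans in start order."""
    input_messages = [
        {"role": "user", "parts": [{"type": "text", "content": user_prompt}]}
    ]
    output_messages: list = []

    # The last chat span holds the final assistant reply; earlier chat spans
    # hold the intermediate assistant turns (typically tool_call parts).
    chat_indexes = [i for i, span in enumerate(child_spans) if span["kind"] == "chat"]
    last_chat = chat_indexes[-1] if chat_indexes else None

    for i, span in enumerate(child_spans):
        if span["kind"] == "chat":
            if i == last_chat:
                output_messages = list(span["output_messages"])
            else:
                input_messages.extend(span["output_messages"])
        elif span["kind"] == "execute_tool":
            input_messages.append({
                "role": "tool",
                "parts": [{
                    "type": "tool_call_response",
                    "id": span["call_id"],
                    "result": span["result"],
                }],
            })
    return input_messages, output_messages


# Hypothetical child spans for a single tool round trip.
children = [
    {"kind": "chat", "output_messages": [{"role": "assistant", "parts": [{
        "type": "tool_call", "id": "call_1", "name": "get_weather",
        "arguments": {"location": "Paris"}}]}]},
    {"kind": "execute_tool", "call_id": "call_1", "result": "rainy, 57°F"},
    {"kind": "chat", "output_messages": [{"role": "assistant", "parts": [{
        "type": "text", "content": "It is rainy in Paris."}]}]},
]
inputs, outputs = aggregate_wrapper_messages("Weather in Paris?", children)
print([message["role"] for message in inputs])
# prints ['user', 'assistant', 'tool']
```

Keeping the final chat span's reply out of the input list is a design choice that matches the split above: intermediate turns go to gen_ai.input.messages and the final choice goes to gen_ai.output.messages.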

Environment

  • microsoft-opentelemetry 1.3.0
  • Python 3.12, LangGraph create_react_agent
  • Backend exporter: Azure Monitor / Application Insights
  • Verified against semantic-conventions-genai main (≤ commit fcbb5dd, 2026-05-26)
