Summary
microsoft-opentelemetry 1.3.0 emits the wrapper invoke_agent <agent-name> span with gen_ai.input.messages / gen_ai.output.messages in the new spec-compliant {role, parts:[{type, content}]} shape (great — that closed #159), but the message list only contains:
- the first user prompt, and
- the final assistant text reply.
It does not contain the intermediate assistant tool_call turns or the tool-role tool_call_response turns, and the span does not carry gen_ai.tool.definitions. All of that data is correctly emitted on child spans (execute_tool <name> and the inner chat <model> spans), but is never aggregated onto the wrapper invoke_agent span.
Why this matters
Consumers that read tool usage off the invoke_agent span see an empty tool-interaction history. Concrete impact: Azure AI Foundry's cloud trace-evaluation pipeline builds its tool_calls / tool_definitions per item from this span only — so:
- builtin.task_adherence fails with "no tool interactions provided" on every item.
- builtin.intent_resolution mis-scores any prompt whose resolution depended on tool turns.
Spec gap
Per the OpenTelemetry GenAI semantic conventions (now in open-telemetry/semantic-conventions-genai), docs/gen-ai/gen-ai-agent-spans.md defines gen_ai.invoke_agent.internal (and .client) with this normative wording for gen_ai.input.messages:
Instrumentations MUST follow Input messages JSON schema. … Messages MUST be provided in the order they were sent to the model.
The schema's Role enum is system | user | assistant | tool, and ChatMessage.parts allows ToolCallRequestPart (type:"tool_call") and ToolCallResponsePart (type:"tool_call_response"). The spec's own example for gen_ai.input.messages on invoke_agent is exactly the missing pattern:
[
  {"role": "user", "parts": [{"type": "text", "content": "Weather in Paris?"}]},
  {"role": "assistant", "parts": [{
    "type": "tool_call",
    "id": "call_VSPygqKTWdrhaFErNvMV18Yl",
    "name": "get_weather",
    "arguments": {"location": "Paris"}
  }]},
  {"role": "tool", "parts": [{
    "type": "tool_call_response",
    "id": "call_VSPygqKTWdrhaFErNvMV18Yl",
    "result": "rainy, 57°F"
  }]}
]
The spec also recognizes gen_ai.tool.definitions as an opt-in attribute on the invoke_agent span (Tool Definitions JSON schema).
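For illustration, a gen_ai.tool.definitions value for the two tools in the reproduction might be built as in the sketch below. The field names (type, name, description, parameters) follow the common function-tool layout and are an assumption here, as are the descriptions and parameter schemas; the Tool Definitions JSON schema is the authority.

```python
import json

# Hypothetical schemas for the two tools used in the reproduction.
# Field names are an assumption; check them against the Tool Definitions JSON schema.
TOOL_DEFINITIONS = [
    {
        "type": "function",
        "name": "get_current_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
    {
        "type": "function",
        "name": "get_forecast",
        "description": "Return a multi-day forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "days": {"type": "integer"},
            },
            "required": ["location"],
        },
    },
]


def tool_definitions_attribute(definitions):
    """Serialize tool definitions to a JSON string usable as a span attribute value."""
    return json.dumps(definitions, separators=(",", ":"))
```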
Reproduction
- Wrap a LangGraph create_react_agent with two tools (get_current_weather, get_forecast) using microsoft-opentelemetry==1.3.0.
- Send a prompt that forces multiple tool calls, e.g. "Which is warmer right now: NYC, London, or Tokyo? Use the weather tool to check."
- Query App Insights for the wrapper span:
    dependencies
    | where name startswith "invoke_agent "
    | project customDimensions
- Observe that gen_ai.input.messages contains a single user message, gen_ai.output.messages contains a single assistant text, and there are no gen_ai.tool.definitions and no tool turns.
- Inspect the child spans (execute_tool get_current_weather, chat gpt-4o-mini): they correctly carry gen_ai.tool.call.id, gen_ai.tool.call.result, gen_ai.tool.definitions, etc.
Observed wrapper attributes (12 total, no tool data)
_MS.ResourceAttributeId, _MS.GenAIContentId,
gen_ai.operation.name, gen_ai.agent.name, gen_ai.agent.id,
gen_ai.request.model, gen_ai.provider.name, gen_ai.request.choice.count,
gen_ai.usage.input_tokens, gen_ai.usage.output_tokens,
gen_ai.input.messages, gen_ai.output.messages
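The observation above can be checked mechanically against an exported customDimensions record. The sketch below is a hypothetical helper, not part of microsoft-opentelemetry; it assumes the attributes arrive as a dict whose message values are JSON strings (or already-decoded lists) in the {role, parts:[...]} shape.

```python
import json

TOOL_PART_TYPES = ("tool_call", "tool_call_response")


def tool_part_counts(attributes):
    """Count tool_call / tool_call_response parts across input and output messages."""
    counts = {part_type: 0 for part_type in TOOL_PART_TYPES}
    for key in ("gen_ai.input.messages", "gen_ai.output.messages"):
        raw = attributes.get(key)
        if not raw:
            continue
        # Exported attribute values are usually JSON strings; accept lists too.
        messages = json.loads(raw) if isinstance(raw, str) else raw
        for message in messages:
            for part in message.get("parts", []):
                if part.get("type") in counts:
                    counts[part["type"]] += 1
    return counts


def is_missing_tool_data(attributes):
    """True when the span has no tool parts and no gen_ai.tool.definitions."""
    no_tool_parts = sum(tool_part_counts(attributes).values()) == 0
    return no_tool_parts and "gen_ai.tool.definitions" not in attributes
```

Run against the wrapper span from the reproduction, is_missing_tool_data would be expected to return True; against the child chat spans it should return False.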
Expected behavior
On the invoke_agent <name> span the wrapper should aggregate from the child chat / execute_tool spans (or capture the same data directly) so that:
- gen_ai.input.messages contains the full ordered chat history including every intermediate assistant-role message with tool_call parts and every tool-role message with tool_call_response parts.
- gen_ai.output.messages reflects the final assistant choice(s), still per the output-messages JSON schema.
- gen_ai.tool.definitions is populated with the tool schemas exposed to the agent.
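One possible way to do the aggregation is sketched below. It is not the library's implementation: the function name and the span representation (a dict with start_time and attributes) are hypothetical, and it assumes each inner chat call resends the whole conversation, so the last chat span's input already holds the full ordered history.

```python
import json


def _load(value):
    """Decode a JSON-string attribute into a list; tolerate missing or decoded values."""
    if value is None:
        return []
    return json.loads(value) if isinstance(value, str) else list(value)


def aggregate_agent_attributes(chat_spans):
    """Derive invoke_agent attributes from child chat spans.

    chat_spans: list of {"start_time": number, "attributes": dict} (hypothetical shape).
    """
    if not chat_spans:
        return {}
    ordered = sorted(chat_spans, key=lambda span: span["start_time"])
    last = ordered[-1]["attributes"]

    # Union tool definitions by name, keeping first-seen order.
    definitions, seen = [], set()
    for span in ordered:
        for definition in _load(span["attributes"].get("gen_ai.tool.definitions")):
            name = definition.get("name")
            if name not in seen:
                seen.add(name)
                definitions.append(definition)

    result = {
        # Assumption: the final chat call's input is the full history sent to the model.
        "gen_ai.input.messages": json.dumps(_load(last.get("gen_ai.input.messages"))),
        "gen_ai.output.messages": json.dumps(_load(last.get("gen_ai.output.messages"))),
    }
    if definitions:
        result["gen_ai.tool.definitions"] = json.dumps(definitions)
    return result
```

If the assumption about resent history does not hold for a given framework, the messages would instead need to be merged span by span in start-time order.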
Environment
- microsoft-opentelemetry 1.3.0
- Python 3.12, LangGraph create_react_agent
- Backend exporter: Azure Monitor / Application Insights
- Verified against semantic-conventions-genai main (≤ commit fcbb5dd, 2026-05-26)
Related
gen_ai.tool.definitions.